Repeated subsamples during DNA extraction reveal increased diversity estimates in DNA metabarcoding of Malaise traps

Abstract With increased application of DNA metabarcoding in biodiversity assessment, various laboratory protocols have been optimized, and their further evaluation is the subject of current research. Homogenization of bulk samples and subsequent DNA extraction from a subsample of destructed tissue is a common first stage of the metabarcoding process. This can be conducted either using sample material soaked in a storage fixative, e.g., ethanol (here referred to as "wet" treatment), or from dried individuals ("dry"). However, it remains uncertain whether perfect mixing and equal distribution of DNA within the tube is ensured during homogenization and to what extent incomplete mixing and resulting variations in tissue composition affect diversity assessments if only a fraction of the destructed sample is processed in the downstream metabarcoding workflow. Here we investigated the efficiency of homogenization under wet and dry conditions and tested how variations in destructed tissue composition might affect diversity assessments of complex arthropod samples. We considered five time intervals of Malaise trap bulk samples and processed nine different subsamples of homogenized tissue (20 mg each) in both treatments. Results indicate a more consistent diversity assessment from dried material, but at the cost of a higher processing time. Both approaches detected comparable OTU diversity and revealed similar taxa compositions in a single tissue extraction. With an increased number of tissue subsamples during DNA extraction, OTU diversity increased for both approaches, especially for highly diverse samples obtained during the summer. Here, particularly the detection of small and low-biomass taxa increased. The processing of multiple subsamples in the metabarcoding protocol can therefore be a helpful procedure to enhance diversity estimates and counteract taxonomic bias in biodiversity assessments. However, the process incurs higher costs and time effort, and its application in large-scale biodiversity assessment, e.g., in monitoring schemes, needs to be considered on a project-specific basis.


KEYWORDS
arthropod metabarcoding, biodiversity, DNA extraction, monitoring
However, streamlining workflows is essential to ensure comparability of data from different studies and the use of metabarcoding for applied biodiversity monitoring (Bush et al., 2019; McGee et al., 2019; Pawlowski et al., 2018).
Since drying and homogenizing result in a fine powder, dried material requires more careful handling than wet material due to the increased chance of cross-contamination. Most DNA extraction protocols are limited to low amounts of starting tissue per reaction, and only a subsample of the complete material is usually processed, ranging between 1 and 100 mg (Marquina et al., 2019; Mata et al., 2021; Sire et al., 2022). Higher tissue volume during DNA extraction requires multiple reactions or more voluminous DNA extraction kits, resulting in increased effort and cost. However, DNA extraction from subsampled tissue assumes perfect homogenization and equal distribution within storage tubes, and it remains unknown to what extent variation in tissue composition affects the assessment of species contained within bulk samples.
The effect of different extraction protocols has been examined in a number of studies, but the majority are either based on aquatic samples or do not include an evaluation of the pre-extraction steps (Majaneva et al., 2018; Mata et al., 2021; Pereira-da-Conceicoa et al., 2021). The material from these studies has a lower diversity and biomass than typical Malaise trap samples, which means the two cannot be directly compared, as Malaise trap samples require additional adjustments. Buchner and Leese (2020) investigated the overlap of species detection between subsamples of homogenized tissue obtained from Malaise traps; here, the authors focused on wet homogenization. In the current study, we provide a detailed examination of the effect of dry homogenization and of the different taxonomic groups in the assessed communities. We use five time-interval Malaise trap samples collected in a protected area in Germany and investigate the effect of homogenization strategy and tissue subsampling on biodiversity assessments. This study provides new insights into the efficiency of extraction protocols for Malaise trap sampling and introduces a way of increasing diversity detection for tissue-based DNA metabarcoding approaches.

| Sampling
Samples were collected in the Nature reserve "Latumer Bruch" near Krefeld in Western Germany. All samples originate from one Malaise trap (51.326701 N, 6.632973 E). Detailed information about samples taken between May and July is given in Table 1.
Malaise trap sampling was conducted in a standardized manner based on the bicolored model by Henry Townes (Matthews & Matthews, 1983;Townes, 1972

| Laboratory work
Supernatant ethanol was removed, and each sample was separated into two size classes by sieving wet specimens through a 4 mm × 4 mm mesh with a wire diameter of 0.5 mm (untreated stainless steel). In the following, the size fractions will be referred to as either S (small, ≤4 mm) or L (large, >4 mm). Depending on sample volume, individuals of both size classes were transferred to either 30 ml tubes (Nalgene, wide-mouth bottle, polypropylene) or 50 ml  This included an additional homogenization step for those samples.
The former approach will be referred to as wet homogenization, while the latter approach (wet with additional dry homogenization) will be referred to as dry homogenization. Together with six negative controls (

| Data analysis
The quality of sequences delivered by Macrogen was determined through the program FastQC (Andrews et al., 2012). Subsequent data processing was conducted using standard settings for all samples as implemented in JAMP v0.67 (https://github.com/VascoElbrecht/JAMP). Paired-end reads were merged with vsearch v2.15.0 (Rognes et al., 2016). Cutadapt v3.4 (Martin, 2011) was used to remove primers and to discard sequences of unexpected length so that only reads with a length of 303-323 bp were used for further analyses.
All reads with an expected error >0.5 were excluded from further analysis. Sequences were dereplicated, singletons were removed, and sequences with ≥97% similarity were clustered into Operational Taxonomic Units (OTUs) using uparse. Chimera filtering was conducted using the uchime3_denovo option in vsearch. OTUs with a minimal read abundance of 0.003% per sample were retained for further analysis, and the program LULU was used for further qualitative filtering (Frøslev et al., 2017). Reads in negative controls were subtracted from the corresponding OTUs in samples (processed on the same extraction plate and sequenced on the same sequencing run).
and Hymenoptera (L: 38%, S: 32.3%), the main proportion of reads was related to Diptera (L: 60.8%, S: 77%) and <10% to Hymenoptera (Table 2). This was most pronounced for size fraction S, where on average only 4.7% of the reads were assigned to this highly diverse order. A fixed threshold of 97% similarity was used for OTU clustering, and several OTUs show the same species-level assignments (Table S1). Further analyses were based on total OTU numbers, and no merging of molecular units with identical species-level assignment was conducted. We chose this diversity level to circumvent the merging of OTUs only assigned to higher taxonomic levels (such as genus or family), since we have no information on whether those belong to the same species. Since the comprehensiveness of reference databases is strongly biased toward specific groups and shows huge gaps for others (Geiger et al., 2016), the lumping of OTUs based on assigned taxonomy could lead to a group-specific underestimation of diversity.
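The read and OTU filters described in the data analysis can be illustrated with a minimal Python sketch. This is not the JAMP, Cutadapt, or vsearch implementation; all function names are hypothetical. The expected error is computed with the standard definition (sum of per-base error probabilities derived from Phred scores), and the negative control is assumed to be supplied as a single read-count table per OTU, since the text does not state how several controls were combined.

```python
from typing import Dict, List


def expected_errors(phred_scores: List[int]) -> float:
    """Expected number of errors in a read: sum of 10^(-Q/10) over all bases."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_read_filters(phred_scores: List[int],
                        min_len: int = 303, max_len: int = 323,
                        max_ee: float = 0.5) -> bool:
    """Keep reads of 303-323 bp whose expected error does not exceed 0.5."""
    length = len(phred_scores)
    return min_len <= length <= max_len and expected_errors(phred_scores) <= max_ee


def filter_low_abundance(counts: Dict[str, int],
                         min_fraction: float = 0.00003) -> Dict[str, int]:
    """Retain OTUs that reach at least 0.003% of the reads of one sample."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {otu: n for otu, n in counts.items() if n / total >= min_fraction}


def subtract_negative_control(counts: Dict[str, int],
                              negative: Dict[str, int]) -> Dict[str, int]:
    """Subtract negative-control reads per OTU, never going below zero.

    Assumption: `negative` holds the control reads from the same extraction
    plate and sequencing run, already combined into one table.
    """
    return {otu: max(n - negative.get(otu, 0), 0) for otu, n in counts.items()}
```

In this sketch, OTUs whose reads are fully explained by the negative control remain in the table with a count of zero, so they can still be inspected before being dropped.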
While different collection dates and the different size classes per sample showed distinct community compositions (Figure 2, p < .002), the treatment during homogenization did not affect sample ordination in NMDS analysis (p = .997, Figure 2). However, the average Jaccard dissimilarity between subsamples homogenized in dry condition (0.179 ± 0.06) was lower (p < .001) than dissimilarities between subsamples homogenized with wet treatment (0.207 ± 0.081) and when subsamples of one emptying date and size fraction were compared among homogenization approaches (0.206 ± 0.089, p < .001). Bray-Curtis dissimilarity was on average lower within dry (0.11 ± 0.07) than within wet (0.11 ± 0.09) homogenized subsamples and when subsamples between the two homogenization approaches were compared (0.13 ± 0.08, p < .01).
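For readers less familiar with the two indices, the following Python sketch shows how Jaccard (presence/absence) and Bray-Curtis (abundance) dissimilarities between subsamples can be computed and averaged over all pairs. The function names are hypothetical, and whether read counts were transformed or normalized before the Bray-Curtis calculation is not stated in the text, so raw counts are used here as an assumption.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, Dict, List

OtuTable = Dict[str, int]  # OTU identifier -> read count in one subsample


def jaccard_dissimilarity(a: OtuTable, b: OtuTable) -> float:
    """1 - |shared OTUs| / |all OTUs|, based on presence/absence only."""
    present_a = {otu for otu, n in a.items() if n > 0}
    present_b = {otu for otu, n in b.items() if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


def bray_curtis_dissimilarity(a: OtuTable, b: OtuTable) -> float:
    """Sum of absolute count differences divided by the sum of all counts."""
    otus = set(a) | set(b)
    total = sum(a.get(o, 0) + b.get(o, 0) for o in otus)
    if total == 0:
        return 0.0
    return sum(abs(a.get(o, 0) - b.get(o, 0)) for o in otus) / total


def mean_pairwise(samples: List[OtuTable],
                  metric: Callable[[OtuTable, OtuTable], float]) -> float:
    """Average dissimilarity over all unordered pairs of subsamples."""
    return mean(metric(x, y) for x, y in combinations(samples, 2))
```

With nine subsamples per sample and treatment, `mean_pairwise` would average over 36 pairs.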
For samples processed under wet treatment, on average, 60.7% ± 8.5 of extrapolated total diversity could be detected in size fraction S and 75.8% ± 7.9 in size fraction L when only one subsample was processed in extraction (~20 mg of tissue, Figure 3)
Table S1) for both homogenization approaches and all subsamples combined. Both size classes were processed and analyzed separately, here defined in column "Size", L = size fraction large including specimens >4 mm, and S = size fraction small including specimens ≤4 mm.
assessed with 12 ± 5.4 and 17 ± 3.4 subsamples. For samples homogenized by dry treatment, a single extraction (~20 mg) of size class S revealed 64% ± 2.3 of calculated species richness. In comparison, 79% ± 3.8 could be detected in 20 mg of size class L (Figure 3). On average, 88.9% ± 1.6 of total diversity was assessed with the nine applied subsamples for size fraction S and 93.1% ± 3.8 for size fraction L. In contrast, 95% of total diversity would be calculated with 16 ± 2.1 (S, 320 mg) and 13 ± 7.1 (L, 260 mg) subsamples (Figure 3).
For wet homogenization, detailed analysis of Diptera representatives revealed 70.6% ± 5.6 of calculated diversity in size fraction S and 77% ± 10.3 in size fraction L when only a single extraction subsample was processed. Detected richness increased to 92.1% ± 4.7 and 91% ± 5.6 when nine subsamples were processed (95% were reached with 14.2 ± 7.9 and 5 ± 1.9 subsamples). With additional dry homogenization, processing of one tissue subsample revealed 70.6% ± 2.7 of extrapolated diversity for size fraction S and 81.1% ± 5.9 for size fraction L. With nine extraction subsamples, taxa detection increased to 91.2% ± 3.4 (S) and 93.3% ± 5.5 (L). Calculations revealed detection of 95% of total Diptera diversity if 15.6 ± 5.4 (~320 mg) and 14.2 ± 10.2 (~200 mg) replicates were processed. For detailed information about observed and calculated species richness, see Figure 3.
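The percentages above relate the OTUs found in one or several subsamples to an extrapolated total richness. The text available here does not name the estimator used for this extrapolation, so the following Python sketch substitutes the bias-corrected Chao2 incidence estimator purely as an illustrative stand-in; the function names are hypothetical and the numbers it produces are not expected to reproduce the reported values.

```python
from collections import Counter
from statistics import mean
from typing import List, Set


def chao2_bias_corrected(subsamples: List[Set[str]]) -> float:
    """Bias-corrected Chao2: S_obs + ((m-1)/m) * Q1*(Q1-1) / (2*(Q2+1)).

    m  = number of subsamples,
    Q1 = OTUs found in exactly one subsample,
    Q2 = OTUs found in exactly two subsamples.
    """
    m = len(subsamples)
    if m == 0:
        raise ValueError("at least one subsample is required")
    frequency = Counter(otu for sub in subsamples for otu in sub)
    s_obs = len(frequency)
    q1 = sum(1 for c in frequency.values() if c == 1)
    q2 = sum(1 for c in frequency.values() if c == 2)
    return s_obs + ((m - 1) / m) * q1 * (q1 - 1) / (2 * (q2 + 1))


def fraction_detected_by_single_subsample(subsamples: List[Set[str]]) -> float:
    """Mean richness of one subsample divided by the estimated total richness."""
    estimated_total = chao2_bias_corrected(subsamples)
    return mean(len(sub) for sub in subsamples) / estimated_total
```

Each subsample is represented as the set of OTUs detected in it (for example, nine sets for the nine 20 mg extractions of one size fraction).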
Based on the accumulation of OTUs processing nine different subsamples (Figure 3), we calculated the proportion of additional OTUs with an increased number of subsamples (Figure 4). For both treatments, the highest increase was observed when two instead of one subsample of size fraction S (≤4 mm) were processed. The increase was the highest for OTUs assigned to Hymenoptera (average distance wet: 11.4% ± 2.7, dry: 11.3% ± 1.2). With an increased number of subsamples, the distance in OTU number between samples decreased for all treatments and size fractions.
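The accumulation logic described above can be sketched as follows. The example averages OTU richness over every possible combination of k subsamples and then expresses the gain from k to k+1 subsamples as a percentage. Two points are assumptions of this sketch rather than statements from the text: the averaging over all combinations (instead of, for example, random permutations), and the use of the pooled OTU number of all subsamples as the denominator. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Set


def accumulation_curve(subsamples: List[Set[str]]) -> List[float]:
    """Mean number of OTUs detected with k = 1..n pooled subsamples."""
    curve = []
    for k in range(1, len(subsamples) + 1):
        # Richness of every possible pool of k subsamples.
        sizes = [len(set().union(*combo)) for combo in combinations(subsamples, k)]
        curve.append(sum(sizes) / len(sizes))
    return curve


def relative_gains(subsamples: List[Set[str]]) -> List[float]:
    """Percentage of additional OTUs gained by each further subsample.

    Entry 0 is the step from one to two subsamples, entry 1 the step from
    two to three, and so on. Gains are relative to the pooled OTU number.
    """
    total = len(set().union(*subsamples))
    if total == 0:
        return []
    curve = accumulation_curve(subsamples)
    return [100 * (curve[i + 1] - curve[i]) / total for i in range(len(curve) - 1)]
```

For nine subsamples this evaluates 511 combinations in total, which remains cheap; for much larger numbers of subsamples, random permutations would be the more practical choice.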

| DISCUSSION
Homogenization of bulk samples and subsequent DNA extraction from destructed tissue is widely applied when insect biodiversity is assessed using DNA metabarcoding (Beermann et al., 2021; Hardulak et al., 2020; Mata et al., 2021). This destructive approach is known to be highly efficient for bulk sample analysis at a lower cost compared with nondestructive protocols (Marquina et al., 2019; Persaud et al., 2021; Sire et al., 2022; Zenker et al., 2020). However, DNA extraction from ethanol fixative or lysis buffer has the advantage of preserving sample integrity, and the evaluation of various protocols is ongoing. Several studies demonstrate nondestructive approaches as promising alternatives for insect diversity assessment (Carew et al., 2018; Iwaszkiewicz-Eggebrecht et al., 2022; Nielsen et al., 2019; Svenningsen et al., 2021). A similar study based on DNA extraction from lysis buffer also demonstrated an increase in the number of species detected by increasing sample volume (of buffer) during extraction. However, a comparison between destructive and nondestructive approaches is beyond the scope of the present study. Here, we set out to test different homogenization protocols and how subsampling of homogenized tissue affects diversity estimates of highly diverse Malaise trap samples.

| Comparison of different homogenization approaches
The average dissimilarity between reactions homogenized with dry treatment was lower than dissimilarity between subsamples processed with wet treatment for presence/absence analysis, which
FIGURE 2 Non-metric multidimensional scaling based on (a) Jaccard (presence/absence data) and (b) Bray-Curtis (abundance data) dissimilarity matrices. Samples include the nine subsamples per collection date (color coding), size fraction (S and L, marked in figure), and treatment during homogenization (shape coding).
In addition, on average, a lower tissue weight per subsample was processed after wet homogenization (difference between dry and wet subsamples on average 3 mg). However, Jaccard's dissimilarity indices of subsamples that were homogenized wet were on average 0.21 ± 0.08, mainly due to high inconsistencies between subsamples from June 8th with an average dissimilarity of 0.35 ± 0.02 compared with dissimilarities between subsamples of the other collection dates (0.18 ± 0.06; Figure 2). The subsample inconsistency of this collection date (8th of June) is difficult to explain. These subsamples show medium diversity estimates and more diverse samples depict a higher similarity between subsamples (July). Since we did not investigate morphological features of detected taxa, taxonomic composition could be a factor for insufficient homogenization, impacting only the wet treatment. It also needs to be considered that for the present approach, DNA quantity after extraction was only measured through the band strength on an agarose gel (band visible for all samples), and that no adjustment of DNA quantity was conducted before PCR. Since subsamples were processed from identical samples, no significant differences in extraction success due to sample composition were expected. For comparing samples constituting highly different communities, the adjustment of DNA quantity might be an option; however, the different biomass of samples also needs to be considered.
The homogenization of dried samples includes drying for at least 48 h at temperatures around 50°C to guarantee the complete evaporation of ethanol from the sample, which increases processing time of the metabarcoding protocol. While the handling of dried powder appears more sensitive to cross-contamination than wet material processing, we could not support this with the analysis of negative controls (dry: 86 ± 82, wet: 520 ± 561). Here it needs to be considered that contamination risk was only assessed from extraction onward.
The x-axis describes differences between processed subsamples (e.g., 1-2: difference in relative number of OTUs detected with one subsample (~20 mg) compared with two subsamples (~40 mg)). (a)
in ethanol circumvents this drying step. It is, therefore, more time-efficient and suitable for large-scale approaches as implemented in several studies on aquatic samples (Hajibabaei et al., 2012; Majaneva et al., 2018; Pereira-da-Conceicoa et al., 2021) and has been introduced as a scalable and cost-efficient protocol elsewhere (Buchner et al., 2021). The minor differences we observed between the two applied methods and the abovementioned experimental setup allow us to recommend homogenization of wet material for tissue-based DNA metabarcoding of Malaise trap samples, as it reduces handling time and hence improves the scalability of the protocol. Results indicate that an additional homogenization step of dried material, e.g., through bead grinding, should be integrated after the material has been subsampled to increase the fineness of material (Buchner et al., 2021). The most pronounced increase in OTUs with additional extraction subsamples was observed for the order Hymenoptera.

| Extraction subsamples during homogenization
Representatives of Diptera and Hymenoptera are the main targets in many Malaise trapping studies (Ssymank et al., 2018). However, as also indicated in previous studies, Diptera are present in much higher individual numbers, constituting a higher proportion of biomass in Malaise trap catches, while diversity of both groups is considered to be similar (Geiger et al., 2016). An underrepresentation of specific insect families, especially constituting taxa of low biomass, including small as well as rare insects, has also been reported in previous metabarcoding studies (Krehenwinkel et al., 2017; Yu et al., 2012). This includes, for example, highly diverse parasitoid Hymenoptera, which are involved in important ecosystem functions but include many tiny species contributing only a minor fraction of tissue and consequently small amounts of DNA (Shirazi et al., 2021).
In the present study, each of the nine subsamples with approximately 20 mg of homogenized tissue was processed during extraction.
It needs to be further investigated how the processing of 180 mg in one reaction (e.g., through a kit tolerating higher sample volumes) affects taxa detection. Here, technical details could influence the extraction of DNA molecules from the tissue, potentially reducing the concentration of rare molecules in final reactions or affecting the downstream laboratory protocol, e.g., through high DNA quantities in PCR reactions. Additionally, higher sequencing depth could increase taxa recovery and overlap between extraction subsamples (Shirazi et al., 2021). Higher sequencing depth per sample can be reached by a lower number of samples per sequencing run or more powerful sequencing platforms. This protocol adjustment also comes with a higher cost per sample, but is easier to adapt than processing multiple extraction replicates per sample.
We tested a sequencing depth of on average 114,394 ± 20,365 reads per sample (after quality filtering), and detailed analysis to understand the linkage between higher sequencing depth and replication strategy is beyond the scope of the present study. Again, it is also unclear how adjustments would affect taxa detection if extraction of higher tissue volume (e.g., 180 mg) in one reaction was conducted.
Further investigation could reveal an increase in sequencing depth as the most effective way to optimize taxon recovery under financial constraints, especially if one reaction of a high tissue volume is applied. Also, modifications in bioinformatic filtering can affect the detection of especially small and rare taxa, which contribute low DNA amounts to the sample mixture. Quality filtering of raw sequence data as well as read assignments in OTU tables is essential to exclude PCR and sequencing errors as well as pseudogenes from the datasets and account for false-positive assignments (Andújar et al., 2020; Piper et al., 2019; Turon et al., 2020). However, quality control often implies abundance-based filtering, potentially excluding low-read assignments of taxa present in the sample. Evaluating the processing of multiple subsamples in combination with data filtering adjustments would give further insights on how to optimize metabarcoding workflows for high-resolution insect biodiversity assessment.
While the results presented here indicate an increase in insect diversity through processing a higher number of subsamples, especially for low-biomass taxa, it needs to be considered that results are based on different time intervals of a single Malaise trap. Spanning from the middle of May to the end of July, those samples cover different magnitudes of diversity, which is also accounted for by separate processing of the different size fractions. However, a sample basis of a single trap is insufficient to formulate final recommendations for standardized DNA extraction and metabarcoding of Malaise traps.
More work needs to be conducted across a variety of habitats to test the effect of different insect communities on species recovery.

ACKNOWLEDGMENTS
This work was supported by the Ministry for Environment, Agriculture, Conservation and Consumer Protection of the German State of North Rhine-Westphalia (No. III-1-620.08). We thank Dr. Leighton Thomas and the three anonymous reviewers for their very helpful comments on the manuscript. Open Access funding enabled and organized by Projekt DEAL.

CONFLICT OF INTEREST
The authors declare no competing interests.

DATA AVAILABILITY STATEMENT
The raw sequences used in this study are available at NCBI SRA, accession number PRJNA883590.